
Incident Report: Stuck Collector on Mantle

Date: 2024-01-12
Time: 13:43 (GMT+3)
Duration: 5 hours

Description

A stuck collector was detected on Mantle. At the time of detection, the current block was 43399676 and the last queried block was 43379008, so data collection was lagging by 20,668 blocks.
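The lag follows directly from the two block numbers recorded above. The snippet below is a minimal sketch of that arithmetic; the function names and the `max_lag` threshold are hypothetical and are not taken from the collector's actual code.

```python
def block_lag(current_block: int, last_queried_block: int) -> int:
    """Return how many blocks the collector is behind the chain head."""
    return max(0, current_block - last_queried_block)


def is_lagging(current_block: int, last_queried_block: int, max_lag: int = 1000) -> bool:
    # max_lag is a hypothetical alert threshold, not a value from this incident.
    return block_lag(current_block, last_queried_block) > max_lag


# Block numbers recorded in this report.
lag = block_lag(43399676, 43379008)
print(lag)  # 20668
```

A lag threshold is only a proxy for a stuck collector; a stricter check would also confirm that the last queried block has stopped advancing between samples.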

Root Cause

The suspected root cause was an RPC problem: multiple alerts reported that the collectors were unable to get transactions for Mantle.

Impact

Data collection for Mantle was delayed: the last queried block fell behind the current block until the fix was deployed.

Timeline

  • 13:43 - Abdel first noticed the stuck collector issue.
  • 13:54 - Aaron identified a potential RPC issue and related alerts.
  • 13:57 - Bedirhan noted syncing issues on the Mantle node and recommended using a public RPC URL with Reblok.
  • 18:19 - A fix by Vekil was deployed, allowing collectors to fall back to the public RPC on a per-query basis.

Lessons Learned

The incident highlighted the need for flexible data collection methods and the importance of having fallback mechanisms in place for RPC issues.

Actions Taken

A fix was produced and deployed: all production collectors now fall back to the public RPC on a per-query basis. Support for multiple providers was also added to the data collectors.
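The per-query fallback described above could look roughly like the following. This is an illustrative sketch only, not the deployed fix: the `Provider` type, `query_with_fallback`, and the `private_rpc` / `public_rpc` stand-ins are hypothetical names, and real collectors would call actual RPC clients instead of these stubs.

```python
from typing import Callable, Sequence

# A provider is modeled as a callable that takes a block number and returns
# that block's transactions, raising an exception on failure (hypothetical type).
Provider = Callable[[int], list]


class AllProvidersFailed(Exception):
    """Raised when no configured provider could answer a query."""


def query_with_fallback(block_number: int, providers: Sequence[Provider]) -> list:
    """Try each provider in order for this single query; return the first success."""
    errors = []
    for provider in providers:
        try:
            return provider(block_number)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(exc)
    raise AllProvidersFailed(
        f"block {block_number}: {len(errors)} provider(s) failed"
    )


# Stand-ins for real RPC clients (assumptions for illustration only).
def private_rpc(block_number: int) -> list:
    raise ConnectionError("node is still syncing")


def public_rpc(block_number: int) -> list:
    return [f"tx-in-block-{block_number}"]


result = query_with_fallback(43379009, [private_rpc, public_rpc])
print(result)
```

Because the fallback is decided per query, the primary provider is tried again on the next query, so the collector returns to it automatically once it recovers.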

Escalation link.

Incident Reviewer(s)

Abdel, Bedirhan, Aaron